14. Goals of Data Partitioning

Why Data Partitioning?

Pipelines designed to work with partitioned data fail more gracefully. Smaller datasets, shorter time periods, and related concepts are easier to debug than large datasets, long time periods, and unrelated concepts. When a task fails, only the affected partition has to be debugged and rerun rather than the whole dataset, which reduces both cost and time.
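
As a rough illustration of the rerun benefit, the sketch below processes one partition per day, records which partitions failed, and reruns only those. It is plain Python rather than Airflow code, and the names run_partitions and failed_keys and the sample data are hypothetical.

```python
from typing import Callable, Dict, List


def run_partitions(partitions: Dict[str, list],
                   task: Callable[[list], int]) -> Dict[str, object]:
    """Run `task` on each partition and keep either its result or its exception."""
    results: Dict[str, object] = {}
    for key, rows in partitions.items():
        try:
            results[key] = task(rows)
        except Exception as exc:  # a failure stays isolated to this partition
            results[key] = exc
    return results


def failed_keys(results: Dict[str, object]) -> List[str]:
    """Return the keys of the partitions whose task raised an exception."""
    return [key for key, value in results.items() if isinstance(value, Exception)]


# Hypothetical data: one partition per day; the second day holds a bad record.
partitions = {
    "2020-01-01": [1, 2, 3],
    "2020-01-02": [4, None, 6],
    "2020-01-03": [7, 8, 9],
}

results = run_partitions(partitions, sum)
to_rerun = failed_keys(results)  # only the day with the bad record

# After fixing the bad record, rerun only the failed partition.
partitions["2020-01-02"] = [4, 5, 6]
results.update(run_partitions({key: partitions[key] for key in to_rerun}, sum))
```

The two healthy days are never reprocessed; only the partition that failed is touched again.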

Partitioning also pays off in Airflow itself. If your data is partitioned appropriately, your tasks will naturally have fewer dependencies on each other, so Airflow can run them in parallel and produce your results faster.
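
The idea can be sketched without Airflow: when partitions do not depend on each other, each one can be handed to a separate worker. The example below uses Python's standard ThreadPoolExecutor as a stand-in for a scheduler running independent tasks; the function total_duration and the sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def total_duration(trips):
    """Sum the trip durations (in minutes) of a single partition."""
    return sum(trips)


# Hypothetical partitions keyed by city; no partition needs another's result.
trips_by_city = {
    "chicago": [12, 7, 30],
    "nyc": [5, 45],
    "sf": [20, 20, 20],
}

# Submit one job per partition; the jobs are independent, so they may overlap.
with ThreadPoolExecutor(max_workers=len(trips_by_city)) as pool:
    futures = {
        city: pool.submit(total_duration, trips)
        for city, trips in trips_by_city.items()
    }
    totals = {city: future.result() for city, future in futures.items()}
```

In an Airflow DAG the same shape would appear as several tasks with no edges between them, which the scheduler is then free to run at the same time.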

Types of partitioning

What are four common types of data partitioning?

SOLUTION:
  • Location
  • Logical
  • Size
  • Time

Logical partitioning

Logical Partitioning is the process of…

SOLUTION: Breaking conceptually related data into discrete groups for processing
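
A minimal sketch of logical partitioning, assuming each record carries a field that names the concept it belongs to. The helper partition_by, the field name "kind", and the records are hypothetical.

```python
from collections import defaultdict


def partition_by(records, key):
    """Group records into separate lists according to the value of `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    return dict(groups)


# Hypothetical mixed feed containing two conceptually different record types.
records = [
    {"kind": "trip", "id": 1},
    {"kind": "station", "id": 10},
    {"kind": "trip", "id": 2},
]

logical = partition_by(records, "kind")
# logical["trip"] and logical["station"] can now be processed by separate tasks.
```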

Time Partitioning

Time Partitioning is the process of…

SOLUTION: Processing data based on a schedule or when it was created
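
A minimal sketch of time partitioning, assuming each row has a creation timestamp. The helper rows_for_day, the field name "created_at", and the rows are hypothetical; in a scheduled pipeline the day would typically come from the run's scheduled date rather than being hard-coded.

```python
from datetime import date, datetime, time, timedelta


def rows_for_day(rows, day):
    """Return the rows created on `day`, using a half-open [start, end) window."""
    start = datetime.combine(day, time.min)
    end = start + timedelta(days=1)
    return [row for row in rows if start <= row["created_at"] < end]


# Hypothetical rows spanning two days.
trips = [
    {"id": 1, "created_at": datetime(2020, 1, 1, 8, 30)},
    {"id": 2, "created_at": datetime(2020, 1, 1, 23, 59)},
    {"id": 3, "created_at": datetime(2020, 1, 2, 0, 0)},
]

jan_1 = rows_for_day(trips, date(2020, 1, 1))
jan_2 = rows_for_day(trips, date(2020, 1, 2))
```

The half-open window means a row stamped exactly at midnight belongs to one day only, so consecutive daily runs never process the same row twice.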

Size Partitioning

Size Partitioning is the process of…

SOLUTION: Separating data for processing based on desired or required storage limits
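
A minimal sketch of size partitioning, assuming the limit is expressed in bytes and the data is a list of text lines. The helper partition_by_size and the sample lines are hypothetical.

```python
def partition_by_size(items, max_bytes):
    """Split `items` into consecutive batches of at most `max_bytes` each.

    A single item larger than `max_bytes` is placed in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for item in items:
        size = len(item.encode("utf-8"))
        # Close the current batch if adding this item would exceed the limit.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(item)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical lines with sizes 4, 2, 6, 1 and 5 bytes.
lines = ["aaaa", "bb", "cccccc", "d", "eeeee"]
batches = partition_by_size(lines, max_bytes=8)
```

Each batch can then be written or processed separately while staying under the chosen storage limit.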